@jcrussell
Contributor

Searches for "Nullsoft" in the manifest to avoid false positives. Possibly too strict.

Fixes #1249
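A minimal sketch of the manifest check described above, assuming the manifest has already been pulled out of the PE resources as text. The helper name `looks_like_nsis` and the sample manifest strings are hypothetical and only illustrate the substring test; they are not the handler's actual code.

```python
# Marker string taken from this PR's commit message.
NSIS_MARKER = "Nullsoft.NSIS.exehead"


def looks_like_nsis(manifest: str) -> bool:
    """Return True when the manifest text names the NSIS stub (hypothetical helper)."""
    return NSIS_MARKER in manifest


# Illustrative manifest fragments, not taken from a real installer.
nsis_manifest = '<assemblyIdentity name="Nullsoft.NSIS.exehead" type="win32"/>'
other_manifest = '<assemblyIdentity name="Example.App" type="win32"/>'

print(looks_like_nsis(nsis_manifest))
print(looks_like_nsis(other_manifest))
```

Matching on the full identity rather than on "Nullsoft" alone is stricter, which is the trade-off the description calls out.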

@qkaiser qkaiser self-assigned this Sep 4, 2025
@qkaiser qkaiser self-requested a review September 4, 2025 07:45
@qkaiser qkaiser added enhancement New feature or request format:executable python Pull requests that update Python code labels Sep 4, 2025
@qkaiser
Contributor

qkaiser commented Sep 4, 2025

@jcrussell you should also create integration tests to check that the handler works as expected.

You have to create the following directories:

  • unblob/tests/integration/executable/pe/__input__
  • unblob/tests/integration/executable/pe/__output__

I would put the following in the input directory:

  • a normal PE file
  • a normal PE file with prefix and suffix padding
  • a nullsoft PE file
  • a nullsoft PE file with prefix and suffix padding

To generate the output directory content, run the following:

find unblob/tests/integration/executable/pe/__input__ -type f -exec unblob -f -k -e unblob/tests/integration/executable/pe/__output__ {} \;
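The padded variants listed above could be produced with a small helper like the one below. This is only a sketch: `pad_file` is a hypothetical name, and the 16 zero bytes on each side are an assumption chosen to match the padding that shows up later in this thread.

```python
from pathlib import Path


def pad_file(src: Path, dst: Path, prefix: int = 16, suffix: int = 16) -> int:
    """Write src to dst surrounded by zero bytes and return the new size.

    Hypothetical helper; the padding lengths are assumptions.
    """
    data = src.read_bytes()
    padded = b"\x00" * prefix + data + b"\x00" * suffix
    dst.write_bytes(padded)
    return len(padded)
```

For example, `pad_file(Path("setup.exe"), Path("setup.exe.padded"))` would write a copy that is 32 bytes larger than the original (both file names are placeholders).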

@qkaiser
Contributor

qkaiser commented Sep 29, 2025

@jcrussell any update on this? Do you need assistance?

@jcrussell
Contributor Author

@jcrussell any update on this? Do you need assistance?

@qkaiser: I believe the code is close to final. Do you mind adding the integration test data? It is easier for me to release code than data. Here's what I have been testing with:

Thanks in advance!

@qkaiser
Contributor

qkaiser commented Oct 15, 2025

@jcrussell had to figure out how to handle LFS on forks, looks like it's okay now. Made some adjustments to keep pyright happy given LIEF's ability to return completely different types for the same object.

We need to fix the way the end offset is calculated; it'll probably be based on the section sizes and the header size. Without that, unblob considers everything after the PE as part of the PE chunk.

@jcrussell
Contributor Author

Thanks for moving this along!

We need to fix the way the end offset is calculated; it'll probably be based on the section sizes and the header size. Without that, unblob considers everything after the PE as part of the PE chunk.

I started looking into this:

>>> pe = lief.PE.parse("tests/integration/executable/pe/__input__/nsis-3.11-setup.exe")
>>> pe.original_size
1564991
>>> pe.sizeof_headers
1024
>>> sum([s.sizeof_raw_data for s in pe.sections])
52224
>>> sum([v.size for v in pe.data_directories])
19456

Found this script that dumps a bunch of info, going to try a more complete look at all the parts tomorrow.

https://github.com/lief-project/LIEF/blob/main/api/python/examples/pe_reader.py

@jcrussell
Contributor Author

This works for (some) non-NSIS PEs, but it trims off the data that NSIS appends after the PE, which contains what we actually want to extract. The "trimmed" data is not recognized by any handler. It seems like we need to detect whether it's an NSIS installer in calculate_chunk to see if we need to increase the size to cover the NSIS data.

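        # Headers plus the raw (on-disk) size of every section; this excludes
        # anything appended after the last section, such as the NSIS data.
        size = binary.sizeof_headers + sum(s.sizeof_raw_data for s in binary.sections)

        return ValidChunk(
            start_offset=start_offset,
            end_offset=start_offset + size,
        )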
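Summing section sizes assumes the sections sit back to back right after the headers. A slightly more defensive variant, sketched below, takes the furthest end of any section on disk instead. `Section` and `pe_end_offset` are hypothetical stand-ins so the example runs without LIEF; the field names mirror the PE section header fields PointerToRawData and SizeOfRawData. Like the snippet above, it does not account for data appended after the last section, so an NSIS installer would still be cut short.

```python
from typing import Iterable, NamedTuple


class Section(NamedTuple):
    """Hypothetical stand-in for a parsed PE section header."""

    pointer_to_raw_data: int  # file offset of the section's data
    sizeof_raw_data: int  # size of the section's data on disk


def pe_end_offset(start_offset: int, sizeof_headers: int, sections: Iterable[Section]) -> int:
    """Return the end offset of the PE image within the scanned file.

    The end is the furthest byte covered by the headers or by any section
    that actually has data on disk, shifted by where the PE starts.
    """
    end = sizeof_headers
    for section in sections:
        if section.sizeof_raw_data:  # skip sections with no on-disk data
            end = max(end, section.pointer_to_raw_data + section.sizeof_raw_data)
    return start_offset + end


# Made-up layout: 1024 bytes of headers followed by two sections.
example_sections = [Section(1024, 512), Section(1536, 1024)]
print(pe_end_offset(16, 1024, example_sections))
```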

@qkaiser qkaiser marked this pull request as draft November 25, 2025 12:44
@qkaiser qkaiser added this to the Internship 2026 milestone Nov 25, 2025
@jcrussell
Contributor Author

@qkaiser: sorry for the delay, think this is finally fixed:

$ ls tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/
0-16.padding  1565007-1565023.padding  16-1565007.pe  16-1565007.pe_extract
$ xxd tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/0-16.padding 
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
$ xxd tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/1565007-1565023.padding 
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
$ sha1sum tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/16-1565007.pe
a64bbad73d4638d668ffdbd0887be7d6528d6a9d  tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/16-1565007.pe
$ sha1sum tests/integration/executable/pe/__input__/nsis-3.11-setup.exe 
a64bbad73d4638d668ffdbd0887be7d6528d6a9d  tests/integration/executable/pe/__input__/nsis-3.11-setup.exe
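The offsets in the listing line up with the size reported earlier: 16 + 1564991 = 1565007, and 1565007 + 16 = 1565023. The same comparison can be scripted; the sketch below uses a hypothetical helper, `sha1_of_slice`, and a tiny in-memory stand-in for the installer rather than the real test file.

```python
import hashlib


def sha1_of_slice(data: bytes, start: int, end: int) -> str:
    """Return the SHA-1 hex digest of data[start:end] (hypothetical helper)."""
    return hashlib.sha1(data[start:end]).hexdigest()


# Stand-in payload, wrapped in 16 zero bytes on each side as in the test input.
original = b"MZ" + bytes(range(64))
padded = b"\x00" * 16 + original + b"\x00" * 16

carved = sha1_of_slice(padded, 16, 16 + len(original))
print(carved == hashlib.sha1(original).hexdigest())
```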

@jcrussell jcrussell marked this pull request as ready for review December 4, 2025 01:01
…table

Add support for PE files by relying on LIEF to parse the PE file once
matched on the 'MZ' or 'PE' signature.

If the file is a self-extractable NSIS executable
("Nullsoft.NSIS.exehead" present in manifest) we extract it with 7zip.

Note: the DLL files within MSI extraction directory are no longer
extracted since the PE handler takes care of them. This is an
improvement over the RAR false positive being found in the DLL.

Co-authored-by: Quentin Kaiser <[email protected]>
Development

Successfully merging this pull request may close these issues.

Support NSIS Installers